Using Compression to Identify Classes of Inauthentic Texts

Authors

  • Mehmet M. Dalkilic
  • Wyatt Travis Clark
  • James C. Costello
  • Predrag Radivojac
Abstract

Recent events have made it clear that some kinds of technical texts, generated by machine and essentially meaningless, can be confused with authentic technical texts written by humans. We identify this as a potential problem, since no existing systems for, say, the web can or do discriminate on this basis. We believe that there are subtle short- and long-range word or even string co-occurrences extant in human texts, but not in many classes of computer-generated texts, that can be used to discriminate based on meaning. In this paper we employ universal lossless source coding algorithms to generate features in a high-dimensional space and then apply support vector machines to discriminate between the classes of authentic and inauthentic texts. Compression profiles for the two kinds of text are distinct: the authentic texts are bounded by various classes of more compressible or less compressible computer-generated texts. This in turn led to the high prediction accuracy of our models, which supports our conjecture that there exists a relationship between meaning and compressibility. Our results show that the learning algorithm based upon the compression profile outperformed standard term-frequency text categorization schemes on several non-trivial classes of
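
The idea of turning compressibility into features can be illustrated with a minimal sketch. It is not the authors' feature construction, which the abstract does not specify: the function name `compression_profile` and the choice of the zlib, bz2, and lzma compressors from the Python standard library are assumptions made purely for illustration. Each text is mapped to a small vector of compression ratios; vectors of this kind could then be passed to a classifier such as a support vector machine, which is not shown here.

```python
import bz2
import lzma
import random
import zlib


def compression_profile(text: str) -> list[float]:
    """Return compressed-size / original-size ratios for several compressors.

    Hypothetical helper: the compressors used here are stand-ins for the
    "universal lossless source coding algorithms" mentioned in the abstract.
    """
    data = text.encode("utf-8")
    if not data:
        raise ValueError("text must be non-empty")
    compressors = (zlib.compress, bz2.compress, lzma.compress)
    # A lower ratio means the text is more compressible (more redundant).
    return [len(compress(data)) / len(data) for compress in compressors]


# Two toy inputs at opposite extremes of compressibility.
repetitive = "the cat sat on the mat. " * 200
rng = random.Random(0)  # fixed seed so the example is reproducible
noisy = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(4800))

profile_repetitive = compression_profile(repetitive)
profile_noisy = compression_profile(noisy)

print("repetitive:", [round(r, 3) for r in profile_repetitive])
print("noisy:     ", [round(r, 3) for r in profile_noisy])
```

In this toy setting the highly repetitive string compresses far better than the random letter string under all three compressors. The abstract's claim is that authentic human text falls between such extremes, bounded by computer-generated classes that are either more or less compressible.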

Publication date: 2006